Abstract: We consider the linear regression problem of estimating
an unknown, deterministic parameter vector based on
measurements corrupted by colored Gaussian noise. We present 
and analyze blind minimax estimators (BMEs), which consist of a 
bounded parameter set minimax estimator, whose parameter set 
is itself estimated from measurements. Thus, one does not require 
any prior assumption or knowledge, and the proposed estimator 
can be applied to any linear regression problem. We demonstrate 
analytically that the BMEs strictly dominate the least-squares 
estimator, i.e., they achieve lower mean-squared error for any 
value of the parameter vector. Both Stein's estimator and its 
positive-part correction can be derived within the blind minimax 
framework. Furthermore, our approach can be readily extended 
to a wider class of estimation problems than Stein's estimator, 
which is defined only for white noise and non-transformed 
measurements. We show through simulations that the BMEs 
generally outperform previous extensions of Stein's technique. 

Keywords: Linear regression model, biased estimation, min- 
imax estimation, James-Stein estimation 

I. Introduction 

The problem of estimating a parameter vector from noisy 
measurements has countless applications in science and en- 
gineering. Such estimation problems are typically modeled 
either in a Bayesian setting, in which a prior distribution 
on the parameter is assumed, or in a deterministic setting, 
in which no prior is assumed [1]. This paper examines the 
deterministic estimation problem. We further assume that the
measurements $y = Hx + w$ are linear combinations of the
parameter vector $x$, to which Gaussian noise $w$ is added.
Here the transformation matrix $H$ and the noise covariance
are assumed to be known. We seek an estimate $\hat{x}$ which
approximates $x$ in the sense of minimal mean-squared error
(MSE).

This ubiquitous problem was first addressed by Gauss [2] 
and Legendre [3], who proposed the classical least-squares 
(LS) estimator. Several lines of reasoning can be used to sup- 
port the LS approach. One argument is that the LS estimator 
minimizes the squared error between the measurements $y$ and
the transformed estimate $\hat{y} = H\hat{x}$. The LS estimator is also
the maximum likelihood solution for Gaussian noise. However,
neither of these criteria is directly related to the MSE, or to
any other measure of the distance between $\hat{x}$ and $x$. Another
property of the LS solution is that it is the unbiased estimator 
achieving minimal MSE. Yet by removing the requirement 
of unbiasedness, estimators yielding lower MSE can be constructed.
While linearity and unbiasedness may be intuitively
appealing properties, they have no relation to the primary goal
at hand, namely, achieving low estimation error. Indeed, there
are many examples in which the requirement of unbiasedness
results in absurd estimators [4].

The authors are with the Dept. of Electrical Engineering, Technion, Israel
Institute of Technology, Haifa 32000, Israel. E-mail: zvikabh@technion.ac.il,
yonina@ee.technion.ac.il; phone +972-4-829-4700; fax +972-4-829-5757.
This work was supported by the Israel Science Foundation under Grant No.
536/04.

Because the parameter vector $x$ is deterministic, the MSE
$E\{\|\hat{x} - x\|^2\}$ is generally a function of $x$. In other words,
one method may be better than another for some values of $x$,
and worse for other values. For instance, the trivial estimator
$\hat{x} = 0$ achieves optimal MSE when $x = 0$, but its performance
is otherwise poor. Nonetheless, it is possible to impose a
partial order among estimation techniques [5], as follows. An
estimator $\hat{x}_1$ is said to strictly dominate a different estimator
$\hat{x}_2$ if the MSE of $\hat{x}_1$ is lower than that of $\hat{x}_2$, for all values
of $x$. If the MSE of $\hat{x}_1$ is never higher than that of $\hat{x}_2$, and
is strictly lower for at least one parameter value, then $\hat{x}_1$ is
said to dominate $\hat{x}_2$. An estimator is said to be admissible if
it is not dominated by any other estimator. Surprisingly, when
the parameter vector contains three or more elements, the LS 
method turns out to be inadmissible, i.e., some techniques 
always achieve lower MSE [6]. Thus, it is of interest to 
characterize the class of admissible estimators, and to find 
techniques which dominate LS. 

The study of admissibility is sometimes restricted to linear 
methods $\hat{x} = Gy$. A linear admissible estimator is one which
is not dominated by any other linear strategy. A simple rule 
characterizes the class of linear admissible techniques [7], 
and, given any linear inadmissible estimator, it is possible 
to construct a linear admissible alternative which dominates 
it [8]. However, the problem of admissibility is considerably 
more intricate when the linearity restriction is removed; generally,
admissible estimators are either trivial (e.g., $\hat{x} = 0$)
or exceedingly complex [9], [10]. As a result, much research 
has focused on finding simple nonlinear techniques which 
dominate LS. 

Early work on LS-dominating strategies considered the
independent, identically distributed (i.i.d.) case, for which $H = I$
and the noise is white. Among these, the James-Stein estimator
[5], [11] is the best-known example; other approaches include
the works of Stein [6] and Thompson [12]. Various "extended"
James-Stein methods were later constructed for the general
(non-i.i.d.) case [13]-[16]. Of these, Bock's technique [13]
is quoted most often [16], [17]. However, none of these
approaches has become a standard alternative to the LS 
estimator, and they are rarely used in practice in engineering 
applications [16]. Perhaps one reason for this is that some of 
the estimators are poorly justified and seem counterintuitive, 
and as such they are sometimes regarded with skepticism (see 
discussion following [18]). Another reason is that many of 
these approaches (including Bock's method) result in shrink- 
age estimators, consisting of a gain factor multiplying the LS 
estimate. Shrinkage techniques can certainly be used to reduce 
MSE; however, in the non-i.i.d. case, some measurements are
noisier than others, and thus a single shrinkage factor for 
all measurements can be considered suboptimal. Furthermore, 
in some applications, a gain factor has no effect on final 
system performance: for example, in an image reconstruction 
problem, multiplying the entire image by a constant does not 
improve quality. 

In this paper, we provide a framework for generating a wide 
class of low-complexity, LS-dominating estimators, which are 
constructed from a simple, intuitive principle, called the blind 
minimax approach [19], [20]. This method is used as a basis 
for selecting and generating techniques tailored for given 
problems. Many blind minimax estimators (BMEs) reduce to 
Stein-type methods in the i.i.d. case, and they continue to 
dominate the LS solution in the general, non-i.i.d. case as 
well. Thus, we show analytically that the proposed technique 
achieves lower MSE than LS, when an appropriate condition 
on the problem setting is satisfied. Unlike Bock's approach, 
BMEs may be constructed so that they are non-shrinkage, 
which improves their performance. Furthermore, extensive 
simulations show that BMEs considerably outperform Bock's 
method. 

BMEs are based on linear minimax estimators over a 
bounded parameter set [21], [22]. These are linear methods 
designed for a slightly different problem, in which the pa- 
rameter is known to belong to a given set. The minimax 
approach has been thoroughly studied in this setting, and 
closed-form solutions are known for many types of sets. In 
our case, however, no prior information about the parameter 
set is assumed. Instead, the blind minimax approach makes 
use of a two-stage process (Section II): First, a set is estimated
from the measurements; next, a minimax method for this set 
is used to estimate the parameter itself. The result may be 
viewed as a simple decision rule, independent of this two-stage 
construction process. Indeed, our LS-dominance proofs do not 
rely on the method by which the techniques are generated. 
In particular, the dominance results do not depend on the 
parameter actually lying within the estimated set. Thus, the 
blind minimax technique provides a framework whereby many 
different estimators can be generated, and provides insight into 
the mechanism by which these techniques outperform the LS 
approach. 

BMEs differ in the method by which the parameter set
is estimated. In Section III, we study the case in which the
estimated set is a sphere; Section IV derives estimators based
on an ellipsoidal parameter set. Section V demonstrates that
several existing Stein-type methods can also be derived in
the blind minimax framework. Section VI compares the blind
minimax approach with LS regularization techniques, while
in Section VII the BMEs are compared with other Stein-type
decision rules. The paper concludes with a discussion
in Section VIII.

Throughout this paper, vectors are denoted by lowercase
boldface letters, and matrices by uppercase boldface letters.
The $i$th component of a vector $v$ is written as $v_i$. $T^{1/2}$
indicates the (unique) positive semidefinite square root of a
positive semidefinite matrix $T$. The notation $v \sim \mathcal{N}_p(\mu, C)$
signifies that $v$ is a random vector of length $p$, distributed
normally with mean $\mu$ and covariance $C$. $\|x\|^2$ is the squared
Euclidean norm $x^*x$, and $\|x\|_T^2$ is the squared $T$-norm $x^*Tx$, where $T$
is a positive definite matrix. Finally, $\mathrm{diag}(a_1, \ldots, a_n)$ refers
to the $n \times n$ diagonal matrix whose diagonal elements are
$a_1, \ldots, a_n$.

II. Blind Minimax Estimation 

Consider the problem of estimating an unknown deterministic
parameter vector $x \in \mathbb{C}^m$ from measurements $y \in \mathbb{C}^n$
given by

$y = Hx + w$ (1)

where $H \in \mathbb{C}^{n \times m}$ is a known matrix and $w$ is a Gaussian
random vector with zero mean and covariance $C_w$. For
simplicity, we assume that $H$ is full-rank and that $C_w$ is
positive definite.

The standard solution to this regression problem is the LS
approach

$\hat{x}_{LS} = (H^* C_w^{-1} H)^{-1} H^* C_w^{-1} y.$ (2)

The MSE of $\hat{x}_{LS}$ does not depend on the value of $x$, and is
given by

$\epsilon_0 = E\{\|\hat{x}_{LS} - x\|^2\} = \mathrm{Tr}(Q^{-1})$ (3)

where

$Q = H^* C_w^{-1} H.$ (4)
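As a rough numerical illustration of (2)-(4), the following NumPy sketch computes the LS estimate, the matrix $Q$, and the LS error $\epsilon_0$. The function name `ls_estimate` and its argument names are illustrative assumptions, not notation from the text; the sketch assumes, as above, that $H$ has full column rank and $C_w$ is positive definite.

```python
import numpy as np

def ls_estimate(H, C_w, y):
    """Sketch of eqs. (2)-(4): returns (x_ls, Q, eps0) for y = H x + w, cov(w) = C_w."""
    Cw_inv_H = np.linalg.solve(C_w, H)            # C_w^{-1} H, without forming C_w^{-1}
    Q = H.conj().T @ Cw_inv_H                     # Q = H* C_w^{-1} H, eq. (4)
    # H* C_w^{-1} y equals (C_w^{-1} H)* y because C_w is Hermitian.
    x_ls = np.linalg.solve(Q, Cw_inv_H.conj().T @ y)   # eq. (2)
    eps0 = np.trace(np.linalg.inv(Q)).real        # eps0 = Tr(Q^{-1}), eq. (3)
    return x_ls, Q, eps0
```

For instance, with $H = I$ and $C_w = 2I$ in three dimensions, the sketch returns the measurements themselves as the estimate and $\epsilon_0 = 6$.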

Despite the popularity of the LS method, other estimators 
are known to achieve lower MSE. We propose a novel strategy 
leading to such LS-dominating techniques, namely, the blind 
minimax approach. To illustrate this concept, suppose for a 
moment that x is known to lie within a compact parameter 
set S. In this case, a linear minimax estimator over the set S 
may be constructed [8], [21], [22]. This is the linear estimator
$\hat{x}_M = Gy$ minimizing the worst-case MSE among all possible
values of $x$ in $S$,

$\hat{x}_M = \arg\min_{\hat{x} = Gy} \max_{x \in S} E\{\|\hat{x} - x\|^2\}.$ (5)

A closed form solution of (5) has been previously derived for
many cases of interest. Furthermore, it has been shown that 
any linear minimax estimator achieves lower MSE than that 
of the LS method, for all values of x in S [8], [19]. Thus, 
as long as some bounded set is known to contain x, minimax 
techniques outperform the LS estimator. 

BMEs utilize minimax estimators when no parameter set is 
known. This is done in a two-stage process: 

1) A parameter set S is estimated from the measurements; 

2) A minimax estimator designed for S is used to estimate 
the parameter vector x. 

Various methods for estimating the parameter set S can be 
used, resulting in a variety of BMEs. In this paper, we consider 
sets of the form $\{x : x^*Tx \le L^2\}$. In the next section,
we examine the case $T = I$, in which the parameter set is
spherical, resulting in a shrinkage estimator. Subsequently, in
Section IV, we discuss the more general case in which
$T = (H^* C_w^{-1} H)^b$ for some real number $b$. In both cases, closed
forms are provided, and dominance over the LS method is 
demonstrated. 
III. The Spherical Blind Minimax Estimator

In this section, we apply the blind minimax technique using 
a spherical parameter set S whose radius L will be estimated 
from measurements. We assume for now that the sphere is
centered on the origin, $S = \{x : \|x\|^2 \le L^2\}$. For a given
value of $L$, the linear minimax estimator is [22]

$\hat{x}_M = \frac{L^2}{L^2 + \epsilon_0} \hat{x}_{LS},$ (6)

where $\hat{x}_{LS}$ is the LS estimator (2) and $\epsilon_0$ is the MSE of
$\hat{x}_{LS}$. The resulting spherical BME (SBME) will have the form
(6), where $L^2$ is estimated from the measurements.

As an estimate of $L^2$, we seek a value as close as possible
to $\|x\|^2$: a smaller value would exclude the true vector $x$
from the parameter set, while a larger value would yield an
overly conservative estimator. Since $x$ is unknown, a natural
alternative is to use $\hat{x}_{LS}$ instead. Thus, we propose to estimate
$L^2$ as $\|\hat{x}_{LS}\|^2$. Substituting into (6), the SBME is then given
by

$\hat{x}_{SBM} = \frac{\|\hat{x}_{LS}\|^2}{\|\hat{x}_{LS}\|^2 + \epsilon_0} \hat{x}_{LS}.$ (7)



In the i.i.d. case, the SBME reduces to the well-known 
Thompson estimator [12]. Under suitable conditions, Thomp- 
son's technique is known to strictly dominate the LS estimator, 
meaning that it achieves lower MSE for all values of x [23]. 
However, the SBME is equally well-defined for the non-i.i.d. 
case. As we shall see, the SBME strictly dominates LS in the 
non-i.i.d. case, and can thus be viewed as a generalization of 
Thompson's results. In Section V we will demonstrate that the
blind minimax approach can be used to derive generalizations 
of additional well-known methods, including Stein's estimator. 

Up to this point, we have arbitrarily chosen the parameter set 
to be centered on the origin. The result was a weighted average 
between the LS estimate and 0. Averaging with a constant 
value may be viewed as a restraint, which lessens the effect 
of measurement noise. As we shall see, the proposed BMEs 
outperform the LS estimator. This result demonstrates the fact 
that the LS approach results in an overestimate: reducing the 
norm of $\hat{x}_{LS}$ improves its performance. However, the choice of
a parameter set centered on the origin is completely arbitrary;
BMEs may be constructed around any constant center point $x_0$
[17]. This will result in a weighted average between $\hat{x}_{LS}$ and
$x_0$, which may be useful if the parameter vector is expected
to lie near a particular point. Thus, the "off-center" SBME is
given by

$\hat{x} = \frac{\|\hat{x}_{LS} - x_0\|^2}{\|\hat{x}_{LS} - x_0\|^2 + \epsilon_0} \hat{x}_{LS} + \frac{\epsilon_0}{\|\hat{x}_{LS} - x_0\|^2 + \epsilon_0} x_0.$ (8)



All dominance results continue to hold for the off-center
techniques as well. In the sequel, we assume $x_0 = 0$ merely
for the sake of notational simplicity.
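The spherical BME, in both its origin-centered and off-center forms, amounts to a single data-dependent gain. The sketch below is one possible NumPy rendering; the function name `sbme` and the optional argument `x0` are illustrative assumptions, and `eps0` is assumed to be the positive LS error $\mathrm{Tr}(Q^{-1})$ computed beforehand.

```python
import numpy as np

def sbme(x_ls, eps0, x0=None):
    """Sketch of the spherical BME: shrink x_ls toward x0 (the origin by default)."""
    x_ls = np.asarray(x_ls)
    x0 = np.zeros_like(x_ls) if x0 is None else np.asarray(x0)
    d2 = np.linalg.norm(x_ls - x0) ** 2      # estimated squared radius L^2
    gain = d2 / (d2 + eps0)                  # weight on x_ls, always in [0, 1)
    # Equivalent to gain * x_ls + (1 - gain) * x0, the weighted-average form.
    return x0 + gain * (x_ls - x0)
```

With `x0` omitted, the function reduces to the origin-centered shrinkage of the LS estimate; passing a center point gives the weighted average between the LS estimate and that point.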

The following theorem demonstrates that the SBME is 
guaranteed to outperform LS in terms of MSE. 

Theorem 1: Suppose $\epsilon_0 / \epsilon_{\max} > 4$, where $\epsilon_0$ is given by (3),
$\epsilon_{\max}$ is the largest eigenvalue of $Q^{-1}$, and $Q = H^* C_w^{-1} H$.
Then, the SBME (7) strictly dominates the LS estimator.



The value $\epsilon_0 / \epsilon_{\max}$ is known as the effective dimension [16],
and may be roughly described as the number of independently-measured
parameters in the system. In the i.i.d. case, for
example, the effective dimension simply equals the length
of the vector $x$. Thus, the condition of Theorem 1 can be
roughly stated as a requirement for a sufficient number of
independent parameters. This requirement is a result of the fact 
that the LS estimator is admissible when up to two parameters 
are estimated [6]. However, since many estimation problems 
contain dozens or hundreds of parameters and measurements, 
the requirement on the effective dimension holds for a variety 
of applications. 
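To make the effective dimension concrete, the following worked example evaluates it in two simple settings. The numerical values in the second case ($m = 10$, one component with variance 100) are assumptions chosen purely for illustration.

```latex
% Case 1: i.i.d. setting, H = I and C_w = \sigma^2 I
Q = \sigma^{-2} I_m, \qquad
\epsilon_0 = \mathrm{Tr}(Q^{-1}) = m\sigma^2, \qquad
\epsilon_{\max} = \sigma^2
\;\Longrightarrow\;
\frac{\epsilon_0}{\epsilon_{\max}} = m .

% Case 2 (assumed values): m = 10, one high-variance component
Q^{-1} = \mathrm{diag}(100, 1, \ldots, 1), \qquad
\epsilon_0 = 109, \qquad
\epsilon_{\max} = 100
\;\Longrightarrow\;
\frac{\epsilon_0}{\epsilon_{\max}} = 1.09 .
```

In the first case the condition of the theorem reads $m > 4$; in the second it fails even though ten parameters are estimated, because a single component accounts for nearly all of the LS error.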

Note that the SBME is a special case of the estimator

$\hat{x}_c = \left(1 - \frac{\epsilon_0}{c + \|\hat{x}_{LS}\|^2}\right) \hat{x}_{LS},$ (9)

in which $c = \epsilon_0$. Thus, rather than proving Theorem 1, we
prove the following, more general proposition, which will also
be used in Section V.

Proposition 1: Under the conditions of Theorem 1, the
estimator $\hat{x}_c$ given by (9) strictly dominates the LS estimator,
for any $c > 0$.

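The proof of Proposition 1 makes use of the following
lemma, which is due to Stein [5, Theorem 1.5.15].

Lemma 1 (Stein): Let $\hat{v} \sim \mathcal{N}_p(v, I)$, and let $g(\hat{v})$ be a
differentiable function such that $E\{|\partial g / \partial \hat{v}_i|\} < \infty$ for all $i$.
Then,

$E\{\partial g(\hat{v}) / \partial \hat{v}_i\} = -E\{g(\hat{v})(v_i - \hat{v}_i)\}.$ (10)

Proof of Proposition 1: To prove the proposition,
first note that the MSE $R(\hat{x}_c) = E\{\|\hat{x}_c - x\|^2\}$ of $\hat{x}_c$ is
given by

$R(\hat{x}_c) = \epsilon_0 + E\left\{ \frac{\epsilon_0^2 \|\hat{x}_{LS}\|^2}{(c + \|\hat{x}_{LS}\|^2)^2} \right\} + 2E\left\{ \frac{\epsilon_0}{c + \|\hat{x}_{LS}\|^2} \hat{x}_{LS}^* (x - \hat{x}_{LS}) \right\}.$ (11)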



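Let $V \Sigma V^*$ be the eigenvalue decomposition of $Q$, such
that $V$ is unitary and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_m)$. Define
$\hat{v} = V^* Q^{1/2} \hat{x}_{LS}$ and $v = V^* Q^{1/2} x$. With these definitions, we
have

$\hat{v}^* \Sigma^{-1} \hat{v} = \hat{x}_{LS}^* \hat{x}_{LS} = \|\hat{x}_{LS}\|^2,$
$\hat{v}^* \Sigma^{-2} \hat{v} = \|\hat{x}_{LS}\|_{Q^{-1}}^2.$ (12)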

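Using these properties, the third term in (11) becomes

$E\left\{ \frac{\epsilon_0}{c + \|\hat{x}_{LS}\|^2} \hat{x}_{LS}^* (x - \hat{x}_{LS}) \right\}$
$= E\left\{ \frac{\epsilon_0}{c + \hat{v}^* \Sigma^{-1} \hat{v}} \hat{v}^* \Sigma^{-1} (v - \hat{v}) \right\}$
$= \epsilon_0 \sum_{i=1}^m \frac{1}{\sigma_i} E\left\{ \frac{\hat{v}_i (v_i - \hat{v}_i)}{c + \hat{v}^* \Sigma^{-1} \hat{v}} \right\}.$ (13)

To evaluate (13), let

$g_i(\hat{v}) = \frac{\hat{v}_i}{c + \hat{v}^* \Sigma^{-1} \hat{v}}$ (14)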
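and note that $\hat{v}$ is distributed normally with mean $v$ and
covariance $I$. We can thus apply Lemma 1 to obtain

$E\left\{ \frac{\epsilon_0}{c + \|\hat{x}_{LS}\|^2} \hat{x}_{LS}^* (x - \hat{x}_{LS}) \right\}$
$= -\epsilon_0 E\left\{ \frac{\mathrm{Tr}(\Sigma^{-1})}{c + \hat{v}^* \Sigma^{-1} \hat{v}} \right\} + 2\epsilon_0 E\left\{ \frac{\hat{v}^* \Sigma^{-2} \hat{v}}{(c + \hat{v}^* \Sigma^{-1} \hat{v})^2} \right\}$
$= -\epsilon_0 E\left\{ \frac{\mathrm{Tr}(Q^{-1})}{c + \|\hat{x}_{LS}\|^2} \right\} + 2\epsilon_0 E\left\{ \frac{\|\hat{x}_{LS}\|_{Q^{-1}}^2}{(c + \|\hat{x}_{LS}\|^2)^2} \right\}.$ (15)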



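Substituting this result back into (11), we have

$R(\hat{x}_c) = \epsilon_0 + E\left\{ \frac{\epsilon_0}{c + \|\hat{x}_{LS}\|^2} \left( \frac{\epsilon_0 \|\hat{x}_{LS}\|^2}{c + \|\hat{x}_{LS}\|^2} - 2\epsilon_0 + \frac{4 \|\hat{x}_{LS}\|_{Q^{-1}}^2}{c + \|\hat{x}_{LS}\|^2} \right) \right\}.$ (16)

Since $c > 0$, we have $\|\hat{x}_{LS}\|^2 / (c + \|\hat{x}_{LS}\|^2) < 1$, and
$\|\hat{x}_{LS}\|_{Q^{-1}}^2 \le \epsilon_{\max} \|\hat{x}_{LS}\|^2$, so that

$R(\hat{x}_c) < \epsilon_0 + E\left\{ \frac{\epsilon_0}{c + \|\hat{x}_{LS}\|^2} (-\epsilon_0 + 4\epsilon_{\max}) \right\}.$ (17)

If $\epsilon_0 > 4\epsilon_{\max}$, then the expectation is taken over a strictly
negative range, and hence $R(\hat{x}_c)$ is always lower than $\epsilon_0$, so
that $\hat{x}_c$ strictly dominates $\hat{x}_{LS}$. ■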
As we have shown, in terms of MSE, the SBME outper- 
forms LS, providing us with a first example of the power of 
blind minimax estimation. The SBME is a shrinkage estimator, 
i.e., it consists of the LS estimator multiplied by a gain factor 
smaller than one. The SBME thus illustrates the fact that the 
LS technique tends to be an overestimate, and shrinkage can 
improve its performance. 
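The dominance result can be checked empirically with a small Monte Carlo experiment. The sketch below is an illustration only: it is restricted to the real-valued case, draws the LS estimate directly from its distribution $\mathcal{N}(x, Q^{-1})$ rather than simulating measurements, and the function name `mc_mse`, the trial count, and the seed are all assumptions made for the example.

```python
import numpy as np

def mc_mse(x, Q, trials=2000, seed=0):
    """Monte Carlo MSE of the LS estimate and of the SBME (real-valued sketch)."""
    rng = np.random.default_rng(seed)
    cov = np.linalg.inv(Q)                        # covariance of the LS estimate
    eps0 = np.trace(cov)                          # LS error, Tr(Q^{-1})
    # Each row is one draw of x_LS ~ N(x, Q^{-1}).
    x_ls = rng.multivariate_normal(x, cov, size=trials)
    n2 = np.sum(x_ls ** 2, axis=1, keepdims=True) # squared norm of each draw
    x_sb = n2 / (n2 + eps0) * x_ls                # SBME shrinkage of each draw
    mse_ls = np.mean(np.sum((x_ls - x) ** 2, axis=1))
    mse_sb = np.mean(np.sum((x_sb - x) ** 2, axis=1))
    return mse_ls, mse_sb

# Example setting (assumed): m = 10 and Q = I, so the effective dimension is 10.
mse_ls, mse_sb = mc_mse(np.zeros(10), np.eye(10))
```

In this setting the empirical LS error should be close to $\epsilon_0 = 10$, and the SBME error should fall below it, in line with the theorem. A simulation of this kind illustrates the result for particular parameter values; it does not replace the proof.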

IV. The Ellipsoidal Blind Minimax Estimator 
A. Motivation 

Not all elements of the least-squares estimate $\hat{x}_{LS}$ are
equally trustworthy. Rather, $\hat{x}_{LS}$ is a Gaussian random vector
with mean $x$ and covariance $Q^{-1} = (H^* C_w^{-1} H)^{-1}$. Thus,
some components of $\hat{x}_{LS}$ have lower variance than others. In
this sense, the scalar shrinkage factor of the SBME (7) and
other extended Stein estimators [13] seems inadequate.

Indeed, several researchers have proposed shrinking each 
measurement according to its variance. Efron and Morris 
[14] propose an empirical Bayes technique, in which high- 
variance components are shrunk more than low-variance ones. 
However, no closed form is available for this estimator, and 
obtaining an estimate requires iteratively solving a set of 
nonlinear equations. Furthermore, it is not known whether this 
method dominates LS. By contrast, Berger [15] provides an 
estimator in which more shrinkage is applied to low-variance 
measurements, despite the fact that low-noise components are 
those for which the LS approach is most accurate. Berger's 
technique is constructed such that the shrinkage of all components
is negligible whenever there is a substantial difference
between the variances of different components. As a result,
dominance over the LS method is guaranteed, but the MSE
gain is insubstantial unless all noise components have similar
variances.

Fig. 1. Illustration of the adaptive shrinkage of the minimax estimator $\hat{x}_M$ for
the parameter set $x^*Tx \le L^2$. Low shrinkage is applied to components of
$\hat{x}_{LS}$ corresponding to small eigenvalues of $T$, while components in directions
of large eigenvalues obtain higher shrinkage. (Axis labels: $\lambda_{\max}(T)$, $\lambda_{\min}(T)$.)

Minimax estimators can easily be adapted for non-scalar
shrinkage. Specifically, consider an ellipsoidal parameter set
of the form $S = \{x : \|x\|_T^2 \le L^2\}$, for some positive definite
matrix $T$ (see Fig. 1). Let $\hat{x}_M$ represent the linear minimax
estimator for this set. It can be shown that $\hat{x}_M$ is a linear
function of $\hat{x}_{LS}$, and one can therefore examine its effect on
each component of $\hat{x}_{LS}$. Consider first components of $\hat{x}_{LS}$
in the direction of narrow axes of the ellipsoid $S$. These
components correspond to large eigenvalues of $T$, and are
denoted $\lambda_{\max}(T)$ in Fig. 1. The parameter set imposes a tight
constraint in these directions, and there will thus be considerable
shrinkage of these elements. By contrast, components in
the direction of wide axes of $S$ (small eigenvalues of $T$) are
not constrained as tightly. Less shrinkage will be applied in
this case, since the LS method is the linear minimax estimator
for an unbounded set. In Fig. 1, the shrinkage of wide-axis
and narrow-axis components is illustrated schematically for a
particular value of $\hat{x}_{LS}$.

Typically, one would want to obtain higher shrinkage for 
high-variance components. Since the covariance of $\hat{x}_{LS}$ is
$Q^{-1}$, we propose a BME based on a parameter set of the
form

$S = \{x : \|x\|_{Q^b}^2 \le L^2\}$ (18)

for some constant $b \le 0$. The bound $L^2$ is estimated as
$L^2 = \|\hat{x}_{LS}\|_{Q^b}^2$. We refer to the resulting technique as the
ellipsoidal BME (EBME). Note that highly negative values of
$b$ yield an eccentric ellipsoid, and hence result in a larger
disparity between the shrinkage of different measurements.
Contrariwise, a choice of $b = 0$ yields scalar shrinkage,
and the resulting estimator is identical to the SBME. As 
we will demonstrate, the EBME dominates the LS method 
under a condition similar to that of the SBME. However, 
the dominance condition of the EBME becomes stricter as b
becomes more negative. Thus, there exists a tradeoff between 
selective shrinkage and a broad dominance condition. In the 
numerical examples below we will choose a value of $b = -1$
as a compromise. 

As an additional motivation for the use of the EBME,
consider the following application example (Fig. 2). Here, a
100-sample signal is to be estimated from measurements of its
discrete cosine transform (DCT). Each component of the DCT
is corrupted by Gaussian noise: high-variance noise is added
to the 10 highest-frequency components, while the remaining
components contain much lower noise levels. Thus, $C_w$ is
diagonal, and $H$ is the DCT matrix. The condition number of
$H^* C_w^{-1} H$ is 1000.

Since $C_w$ is diagonal, the LS estimator is equivalent to
an inverse DCT transform, and thus ignores the differences
in noise level between measurements. This causes substantial
estimation error, as observed in Fig. 2(a). The error is reduced
by the SBME (Fig. 2(b)), which multiplies the LS estimate
by an appropriately chosen scalar; in the example above, the
squared error was reduced by 20% compared with that of
the LS estimate. Hence, merely multiplying the result of the
LS technique by an appropriately chosen scalar can significantly
reduce estimation error. However, the most significant
advantage is obtained by the EBME (Fig. 2(c)), which shrinks
the high-noise coefficients. Specifically, in this example, the
choice $b = -1$ resulted in shrinkage of 0.44 for the high-noise
coefficients, and shrinkage of only 0.98 for low-noise
coefficients. The resulting squared error was 83% lower than
that of the LS estimate.
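A setting of this kind can be reproduced in a few lines. The sketch below builds an orthonormal DCT-II matrix by hand and a diagonal noise covariance whose variances differ by a factor of 1000; the specific variance values, the test signal, and the random seed are assumptions, since the text specifies only the signal length, the number of noisy components, and the condition number. It is not intended to reproduce the reported error figures.

```python
import numpy as np

N = 100
n = np.arange(N)
k = n[:, None]
# Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector.
H = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * k / (2 * N))
H[0, :] /= np.sqrt(2.0)

noise_var = np.full(N, 1.0)       # assumed low noise level
noise_var[-10:] = 1000.0          # assumed high level on the 10 highest frequencies
C_w = np.diag(noise_var)

Q = H.T @ np.linalg.inv(C_w) @ H  # Q = H* C_w^{-1} H
cond_Q = np.linalg.cond(Q)        # ratio of largest to smallest noise variance

rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * n / N)     # arbitrary test signal (assumption)
y = H @ x + rng.normal(size=N) * np.sqrt(noise_var)

# LS estimate; since H is orthonormal this coincides with the inverse DCT H.T @ y.
x_ls = np.linalg.solve(Q, H.T @ np.linalg.solve(C_w, y))
```

Because $H$ is orthonormal here, the eigenvalues of $Q$ are the reciprocals of the noise variances, which is why the condition number equals the variance ratio.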

Thus, our preliminary example demonstrates that it is pos- 
sible to achieve substantial improvements over the LS tech- 
nique by using non-scalar shrinkage. As we will demonstrate 
presently, this empirical finding is only an example of the wide 
range of cases in which the EBME is guaranteed to improve 
on the LS approach. 

B. Dominance 

We begin our analysis by obtaining an expression for 
the EBMEs. A closed form solution for minimax estimators 
of an ellipsoidal parameter set was developed in [22]. By 
substituting the value of L 2 into this closed form, we obtain 
the following result. 

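Proposition 2 (Closed-Form EBME): Let $V \Sigma V^*$ be the
eigenvalue decomposition of $Q = H^* C_w^{-1} H$, where $V$ is
orthonormal and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_m)$. Let $b \in \mathbb{R}$ be any
constant, and suppose the eigenvalues in $\Sigma$ are ordered such that
$\sigma_1^b \ge \sigma_2^b \ge \cdots \ge \sigma_m^b > 0$. Then, the EBME for the parameter
set $S = \{x : \|x\|_{Q^b}^2 \le L^2\}$ with $L^2 = \|\hat{x}_{LS}\|_{Q^b}^2$ is given by

$\hat{x}_{EBM} = V \mathrm{diag}\left( (1 - \alpha \sigma_1^{b/2})_+, \ldots, (1 - \alpha \sigma_m^{b/2})_+ \right) V^* \hat{x}_{LS}$ (19)

when $\hat{x}_{LS} \ne 0$, and by $\hat{x}_{EBM} = 0$ when $\hat{x}_{LS} = 0$. Here
$(\cdot)_+ = \max(\cdot, 0)$,

$\alpha = \frac{r_1}{\|\hat{x}_{LS}\|_{Q^b}^2 + r_2}, \quad r_1 = \sum_{i=k+1}^m \sigma_i^{b/2-1}, \quad r_2 = \sum_{i=k+1}^m \sigma_i^{b-1},$ (20)

and $k$ is chosen as the smallest index $0 \le k \le m - 1$ such
that

$\alpha \sigma_{k+1}^{b/2} < 1.$ (21)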
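Proof: In the case $\hat{x}_{LS} = 0$, we need to find the linear
minimax estimator for the set $S = \{0\}$. Clearly, the solution
in this case is $\hat{x} = 0$. For all other values of $\hat{x}_{LS}$, we seek the
linear minimax estimator for the set $S = \{x : x^* Q^b x \le L^2\}$,
where $L^2 = \hat{x}_{LS}^* Q^b \hat{x}_{LS} > 0$. Substituting this value of $L^2$
into Proposition 1 of [22] yields

$\hat{x}_{EBM} = V \mathrm{diag}(0, \ldots, 0, 1, \ldots, 1) V^* (I - \alpha Q^{b/2}) \hat{x}_{LS}$
$= V \mathrm{diag}(0, \ldots, 0, 1 - \alpha \sigma_{k+1}^{b/2}, \ldots, 1 - \alpha \sigma_m^{b/2}) V^* \hat{x}_{LS}.$ (22)

From (21), it follows that $1 - \alpha \sigma_i^{b/2} \le 0$ for all $i \le k$, and
therefore (22) can be written as (19). ■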
We note that, as long as $\|\hat{x}_{LS}\|_{Q^b}^2 > 0$, it is always possible
to find a value $k$ which satisfies (21). In particular, for
$k = m - 1$, we have

$\alpha = \frac{\sigma_m^{b/2-1}}{\|\hat{x}_{LS}\|_{Q^b}^2 + \sigma_m^{b-1}} < \frac{\sigma_m^{b/2-1}}{\sigma_m^{b-1}} = \sigma_m^{-b/2}$ (23)

which satisfies the requirement (21).
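The closed form of the proposition translates into a short routine: sort the eigenvalues, search for the index $k$, and apply the positive-part gains. The following NumPy sketch is one possible rendering under the stated ordering convention; the function name `ebme` and the default value of `b` are illustrative assumptions, and indices are zero-based, so `sigma[k]` plays the role of $\sigma_{k+1}$.

```python
import numpy as np

def ebme(x_ls, Q, b=-1.0):
    """Sketch of the closed-form ellipsoidal BME for a Hermitian positive definite Q."""
    x_ls = np.asarray(x_ls)
    if not np.any(x_ls):
        return np.zeros_like(x_ls)               # the estimate is 0 when x_ls = 0
    sigma, V = np.linalg.eigh(Q)                 # Q = V diag(sigma) V*
    order = np.argsort(-(sigma ** b))            # sigma_1^b >= ... >= sigma_m^b
    sigma, V = sigma[order], V[:, order]
    z = V.conj().T @ x_ls                        # coordinates in the eigenbasis
    L2 = np.sum(sigma ** b * np.abs(z) ** 2)     # squared Q^b-norm of x_ls
    for k in range(sigma.size):                  # smallest k with alpha*sigma_{k+1}^{b/2} < 1
        r1 = np.sum(sigma[k:] ** (b / 2 - 1))
        r2 = np.sum(sigma[k:] ** (b - 1))
        alpha = r1 / (L2 + r2)
        if alpha * sigma[k] ** (b / 2) < 1:
            break
    gains = np.maximum(1 - alpha * sigma ** (b / 2), 0.0)   # positive-part gains
    return V @ (gains * z)
```

With `b=0.0` all gains coincide and the routine reduces to the scalar shrinkage of the SBME; with negative `b`, components along eigenvectors with small $\sigma_i$ (high variance) receive smaller gains.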

While the closed form of the EBME appears somewhat 
more intimidating than that of the SBME, the computational 
complexities of the two estimators are comparable. The major 
difference is the calculation of the value k, for which m divi- 
sions are required. Like the SBME, the EBME also dominates 
the LS estimator under suitable conditions, as shown in the 
following theorem. 

Theorem 2: Let $\hat{x}_{EBM}$ be the EBME (19) and suppose that

$\mathrm{Tr}(Q^{b/2-1}) > 4 \lambda_{\max}(Q^{b/2-1})$ (24)

where $\lambda_{\max}(Q^{b/2-1})$ is the largest eigenvalue of $Q^{b/2-1}$
and $Q = H^* C_w^{-1} H$. Then, $\hat{x}_{EBM}$ strictly dominates the LS
estimator.

Note that by substituting $b = 0$, this result can be used to
demonstrate the dominance of the SBME over LS estimation
(Theorem 1). However, the method of proof here is different,
and the proof of Theorem 1 will also be used in Section V.

Also note that the dominance condition (24) is satisfied by
many reasonable estimation problems. Assuming a sufficient 
number of parameters, the only case in which this condition 
does not hold is the situation in which a small number of 
parameters (less than four) have much higher variance than 
all other parameters; in this case, the LS method is admissible 
or nearly so. 
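Whether a given problem satisfies the dominance condition can be checked directly from the eigenvalues of $Q$. The sketch below is a minimal illustration; the function name `ebme_dominance_condition` is an assumption, and the routine presumes $Q$ is Hermitian positive definite.

```python
import numpy as np

def ebme_dominance_condition(Q, b=-1.0):
    """Check Tr(Q^{b/2-1}) > 4 * lambda_max(Q^{b/2-1}) for Hermitian positive definite Q."""
    # Eigenvalues of Q^{b/2-1} are the eigenvalues of Q raised to the power b/2 - 1.
    lam = np.linalg.eigvalsh(Q) ** (b / 2 - 1)
    return bool(lam.sum() > 4 * lam.max())
```

Setting `b=0.0` recovers the effective-dimension condition of the spherical case. A problem with one parameter of much higher variance than the rest fails the check even when many parameters are present.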

In order to prove Theorem 2, we observe that the form
(19) of the EBME is similar to Baranchik's positive-part
modification [5], [24] of the James-Stein estimator. Baranchik
proposed using a shrinkage factor of 0 whenever the James-Stein
technique contains negative shrinkage, and showed that
the resulting method dominates the James-Stein estimator.
Although the EBME is not a shrinkage technique, it resembles
Baranchik's modification, since each negative diagonal
component in (19) is replaced with zero. The following proposition
shows that the MSE can be reduced by eliminating this
negative shrinkage.

Fig. 2. Estimation of a signal from measurements of its DCT. In this example, high-frequency components have a much higher noise variance than low-frequency
components. Dashed line indicates original signal; solid line indicates estimate. (a) LS estimate; (b) Spherical BME, resulting in a shrinkage factor
of 0.79; (c) Ellipsoidal BME, with shrinkage in the range 0.44-0.98.

Proposition 3: Let $V \Sigma V^*$ be the eigenvalue decomposition
of $Q = H^* C_w^{-1} H$, and let $b \in \mathbb{R}$ be a constant. Suppose
$\hat{x}$ is an estimator of the form $\hat{x} = V D V^* \hat{x}_{LS}$, where $D$ is
a diagonal matrix, whose diagonal elements $d_i$ are functions
of the random variable $\|\hat{x}_{LS}\|_{Q^b}^2$. Suppose at least one of the
elements $d_i$ is negative with nonzero probability. Then, $\hat{x}$ is
dominated by the (generalized) positive-part estimator

$\hat{x}_+ = V D_+ V^* \hat{x}_{LS},$ (25)

where $D_+$ is a diagonal matrix with diagonal elements
$d_{i+} = \max(0, d_i)$.

Proof: Our proof follows that of Baranchik [24]. We will
show that $\mathrm{MSE}(\hat{x}) - \mathrm{MSE}(\hat{x}_+)$ is nonnegative for all $x$, and
positive for any value of $x$ whose elements are all nonzero.

$\mathrm{MSE}(\hat{x}) - \mathrm{MSE}(\hat{x}_+) = E\{\|\hat{x} - x\|^2\} - E\{\|\hat{x}_+ - x\|^2\}$
$= E\{\|\hat{x}\|^2 - \|\hat{x}_+\|^2\} - 2E\{\hat{x}^* x - \hat{x}_+^* x\}$
$= E\{\hat{x}_{LS}^* V (D^2 - D_+^2) V^* \hat{x}_{LS}\} - 2E\{\hat{x}_{LS}^* V (D - D_+) V^* x\}.$ (26)

Since $d_i^2 - d_{i+}^2 \ge 0$ for all $i$, the first term in (26) is
nonnegative. Hence, to prove the proposition, it suffices to
show that $E\{\hat{x}_{LS}^* V (D - D_+) V^* x\}$ is nonpositive for all $x$,
and negative for values $x$ with nonzero elements.

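To this end, define $z = V^* x$ and $\hat{z} = V^* \hat{x}_{LS}$. We note
that $\hat{z} \sim \mathcal{N}_m(z, \Sigma^{-1})$, so that the elements of $\hat{z}$ are statistically
independent. To calculate $E\{\hat{x}_{LS}^* V (D - D_+) V^* x\}$, we
condition on $\|\hat{x}_{LS}\|_{Q^b}^2$, obtaining

$E\{\hat{x}_{LS}^* V (D - D_+) V^* x\}$
$= E\left\{ E\{ \hat{z}^* (D - D_+) z \mid \hat{z}^* \Sigma^b \hat{z} \} \right\}$
$= E\left\{ \sum_{i=1}^m (d_i - d_{i+}) E\{ \hat{z}_i z_i \mid \hat{z}^* \Sigma^b \hat{z} \} \right\}$ (27)

where we used the fact that $\|\hat{x}_{LS}\|_{Q^b}^2 = \hat{z}^* \Sigma^b \hat{z}$, and that $d_i$
and $d_{i+}$ are deterministic when conditioned on $\|\hat{x}_{LS}\|_{Q^b}^2$. For
each $i$, we further condition on $|\hat{z}_i|$, to obtain

$E\{\hat{x}_{LS}^* V (D - D_+) V^* x\}$
$= E\left\{ \sum_{i=1}^m (d_i - d_{i+}) E\{ \hat{z}_i z_i \mid \hat{z}^* \Sigma^b \hat{z}, |\hat{z}_i| \} \right\}$
$= E\left\{ \sum_{i=1}^m (d_i - d_{i+}) |\hat{z}_i z_i| E\{ \mathrm{sgn}(\hat{z}_i z_i) \mid \hat{z}^* \Sigma^b \hat{z}, |\hat{z}_i| \} \right\}.$ (28)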

Given $|\hat{z}_i|$, we have that either $\hat{z}_i = |\hat{z}_i| \mathrm{sgn}(z_i)$ or that
$\hat{z}_i = -|\hat{z}_i| \mathrm{sgn}(z_i)$. It is evident from the pdf of $\hat{z}_i$ that the latter
option has lower probability, i.e.,

$\Pr\{\mathrm{sgn}(\hat{z}_i) = \mathrm{sgn}(z_i) \mid \hat{z}^* \Sigma^b \hat{z}, |\hat{z}_i|\} \ge \Pr\{\mathrm{sgn}(\hat{z}_i) \ne \mathrm{sgn}(z_i) \mid \hat{z}^* \Sigma^b \hat{z}, |\hat{z}_i|\}.$ (29)

It follows that $E\{\mathrm{sgn}(\hat{z}_i z_i) \mid \hat{z}^* \Sigma^b \hat{z}, |\hat{z}_i|\} \ge 0$, with strict
inequality for $z_i \ne 0$. Therefore, all terms in (28) are
nonnegative, except for $(d_i - d_{i+})$, which is nonpositive. As
a result, (28) is nonpositive for all $x$, and hence (26) is
nonnegative, so that the MSE of $\hat{x}_+$ is never higher than that of $\hat{x}$.

We must also show that, for some $x$, (28) is strictly
negative. To this end, we choose $x$ for which all elements are
nonzero; as a result, all terms in (28) are strictly positive with
probability 1, except for $(d_i - d_{i+})$. The latter term is negative
when $d_i < 0$ and zero otherwise. Since $d_i$ is negative with
nonzero probability for at least one value of $i$, we conclude that
for the chosen value of $x$, (28) is strictly negative, completing
the proof of Proposition 3. ■

This generalization of the concept of a positive part estimator
is now used to prove Theorem 2.

Proof of Theorem 2: Clearly, the EBME (19) is
the positive part of the estimator

$\hat{x}_0 = V \mathrm{diag}(1 - \alpha \sigma_1^{b/2}, \ldots, 1 - \alpha \sigma_m^{b/2}) V^* \hat{x}_{LS}$
$= (I - \alpha Q^{b/2}) \hat{x}_{LS}.$ (30)

Therefore, it suffices to show that $\hat{x}_0$ dominates the LS
estimator, and the theorem follows using Proposition 3.
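The MSE of $\hat{x}_0$ is given by

$E\{\|\hat{x}_0 - x\|^2\} = E\left\{ \left\| x - \hat{x}_{LS} + \alpha Q^{b/2} \hat{x}_{LS} \right\|^2 \right\}$
$= E\left\{ \left\| x - \hat{x}_{LS} + \frac{r_1 Q^{b/2} \hat{x}_{LS}}{\|\hat{x}_{LS}\|_{Q^b}^2 + r_2} \right\|^2 \right\}$
$= \epsilon_0 + E\left\{ \frac{r_1^2 \|\hat{x}_{LS}\|_{Q^b}^2}{(\|\hat{x}_{LS}\|_{Q^b}^2 + r_2)^2} \right\} + 2E\left\{ \frac{r_1 (x - \hat{x}_{LS})^* Q^{b/2} \hat{x}_{LS}}{\|\hat{x}_{LS}\|_{Q^b}^2 + r_2} \right\}.$ (31)

To analyze this expression, we define

$v \triangleq V^* Q^{1/2} x, \quad \hat{v} \triangleq V^* Q^{1/2} \hat{x}_{LS}.$ (32)

Using this notation, the third term in (31) becomes

$A_3 \triangleq E\left\{ \frac{r_1 (x - \hat{x}_{LS})^* Q^{b/2} \hat{x}_{LS}}{\|\hat{x}_{LS}\|_{Q^b}^2 + r_2} \right\}$
$= E\left\{ \frac{r_1 (x - \hat{x}_{LS})^* Q^{1/2} V V^* Q^{b/2-1} V V^* Q^{1/2} \hat{x}_{LS}}{\hat{x}_{LS}^* Q^{1/2} V V^* Q^{b-1} V V^* Q^{1/2} \hat{x}_{LS} + r_2} \right\}$
$= \sum_{i=1}^m \sigma_i^{b/2-1} E\left\{ \frac{r_1 \hat{v}_i (v_i - \hat{v}_i)}{\hat{v}^* \Sigma^{b-1} \hat{v} + r_2} \right\}.$ (33)

Next, define

$g_i(\hat{v}) = \frac{r_1 \hat{v}_i}{\hat{v}^* \Sigma^{b-1} \hat{v} + r_2}.$ (34)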



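Note that $r_1$ and $r_2$ are implicitly dependent on $k$, which in
turn depends on $\hat{v}$. Thus, $g_i(\hat{v})$ is discontinuous at those
values of $\hat{v}$ for which the value of $k$ changes.
However, these values of $\hat{v}$ occur with probability zero; for
all other values, $k$ (and hence $r_1$ and $r_2$) are constant for
sufficiently small changes in $\hat{v}$. Thus,

$$\frac{\partial g_i}{\partial \hat{v}_i} = \frac{r_1}{\hat{v}^*\Sigma^{b-1}\hat{v} + r_2} - \frac{2 r_1\,\sigma_i^{b-1}\hat{v}_i^2}{\left(\hat{v}^*\Sigma^{b-1}\hat{v} + r_2\right)^2} \quad \text{w.p. } 1 \tag{35}$$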



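and $E\{|\partial g_i/\partial \hat{v}_j|\} < \infty$ for all $i, j$. Furthermore, observe that
$\hat{v} \sim N_m(v, I)$. We can therefore apply Lemma 1 to $g_i$. This
yields

$$E\left\{\frac{r_1\,\hat{v}_i\,(v_i - \hat{v}_i)}{\hat{v}^*\Sigma^{b-1}\hat{v} + r_2}\right\} = E\left\{-\frac{r_1}{\hat{v}^*\Sigma^{b-1}\hat{v} + r_2} + \frac{2 r_1\,\sigma_i^{b-1}\hat{v}_i^2}{\left(\hat{v}^*\Sigma^{b-1}\hat{v} + r_2\right)^2}\right\}. \tag{36}$$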



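Substituting (36) into the definition of $A_3$, we obtain

$$
A_3 = E\left\{\frac{2 r_1 \sum_{i=1}^m \sigma_i^{3b/2-2}\hat{v}_i^2}{\left(\hat{v}^*\Sigma^{b-1}\hat{v} + r_2\right)^2}
 - \frac{r_1 \sum_{i=1}^m \sigma_i^{b/2-1}}{\hat{v}^*\Sigma^{b-1}\hat{v} + r_2}\right\}.
\tag{37}
$$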



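Using the definition (32) of $\hat{v}$, $A_3$ may be written as

$$
A_3 = E\left\{\frac{r_1}{\|\hat{x}_{LS}\|_{Q^b}^2 + r_2}\left[\frac{2\,\|\hat{x}_{LS}\|_{Q^{3b/2-1}}^2}{\|\hat{x}_{LS}\|_{Q^b}^2 + r_2} - \mathrm{Tr}\left(Q^{b/2-1}\right)\right]\right\}.
\tag{38}
$$

Note that

$$
\frac{\|\hat{x}_{LS}\|_{Q^{3b/2-1}}^2}{\|\hat{x}_{LS}\|_{Q^b}^2 + r_2}
\le \frac{\|\hat{x}_{LS}\|_{Q^{3b/2-1}}^2}{\|\hat{x}_{LS}\|_{Q^b}^2}
= \frac{(Q^{b/2}\hat{x}_{LS})^* Q^{b/2-1} (Q^{b/2}\hat{x}_{LS})}{(Q^{b/2}\hat{x}_{LS})^* (Q^{b/2}\hat{x}_{LS})}
\le \lambda_{\max}\left(Q^{b/2-1}\right).
\tag{39}
$$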




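Combining (38) and (39), we have

$$
A_3 \le E\left\{\frac{r_1}{\|\hat{x}_{LS}\|_{Q^b}^2 + r_2}\left[2\lambda_{\max}\left(Q^{b/2-1}\right) - \mathrm{Tr}\left(Q^{b/2-1}\right)\right]\right\}.
\tag{40}
$$

Substituting back into (31), and bounding the second term of (31)
using $\|\hat{x}_{LS}\|_{Q^b}^2 \le \|\hat{x}_{LS}\|_{Q^b}^2 + r_2$, we have

$$
\mathrm{MSE} \le \epsilon_0 + E\left\{\frac{r_1}{\|\hat{x}_{LS}\|_{Q^b}^2 + r_2}\cdot\left[r_1 + 4\lambda_{\max}\left(Q^{b/2-1}\right) - 2\,\mathrm{Tr}\left(Q^{b/2-1}\right)\right]\right\}
\tag{41}
$$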

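and using the fact that $r_1 \le \mathrm{Tr}(\Sigma^{b/2-1}) = \mathrm{Tr}(Q^{b/2-1})$, we
conclude that the MSE is bounded by

$$
\mathrm{MSE} \le \epsilon_0 + E\left\{\frac{r_1}{\|\hat{x}_{LS}\|_{Q^b}^2 + r_2}\left[4\lambda_{\max}\left(Q^{b/2-1}\right) - \mathrm{Tr}\left(Q^{b/2-1}\right)\right]\right\}.
\tag{42}
$$

Thus, if $\mathrm{Tr}(Q^{b/2-1}) > 4\lambda_{\max}(Q^{b/2-1})$, then $\mathrm{MSE} < \epsilon_0$,
proving that the EBME dominates the LS estimator. ■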
Thus far, we have presented two examples of BMEs which 
dominate the LS method under suitable conditions. Both 
approaches are extensions of Thompson's technique to the 
non-i.i.d. case. In the next section, we demonstrate that other 
BMEs extend different LS-dominating techniques, namely 
Stein's estimator and Baranchik's positive-part improvement. 
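
The sufficient condition reached at the end of the proof of Theorem 2, $\mathrm{Tr}(Q^{b/2-1}) > 4\lambda_{\max}(Q^{b/2-1})$, is straightforward to check numerically for a given problem. The following is a minimal sketch rather than part of the original analysis: the function name `ebme_dominance_condition` is hypothetical, $Q$ is assumed to be real, symmetric, and positive-definite, and the example assumes $H = I$ with the noise covariance (54) of Section VII (taking $\sigma^2 = 1$) and $b = -1$.

```python
import numpy as np

def ebme_dominance_condition(Q, b):
    """Evaluate Tr(Q^(b/2-1)) > 4 * lambda_max(Q^(b/2-1)).

    Returns (lhs, rhs, holds). Q is assumed symmetric positive-definite,
    so Q^(b/2-1) has eigenvalues sigma_i^(b/2-1).
    """
    eigvals = np.linalg.eigvalsh(Q)           # eigenvalues sigma_i of Q
    powered = eigvals ** (b / 2.0 - 1.0)      # eigenvalues of Q^(b/2-1)
    lhs = powered.sum()                       # trace
    rhs = 4.0 * powered.max()                 # 4 * largest eigenvalue
    return lhs, rhs, bool(lhs > rhs)

# Assumed example: H = I, so Q = C_w^{-1}, with C_w taken from (54), sigma^2 = 1.
noise_var = np.array([1, 1, 1, 1, .5, .2, .2, .2, .2, .1, .1, .1, .1, .05, .05])
Q = np.diag(1.0 / noise_var)
lhs, rhs, holds = ebme_dominance_condition(Q, b=-1)
print(f"Tr = {lhs:.3f}, 4*lambda_max = {rhs:.3f}, condition holds: {holds}")
```

For $Q = I$ the condition reduces to $m > 4$, which gives a quick sanity check of the implementation.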






V. Relation to Stein-type Estimation 

In Section III, the SBME (7) was constructed by using
$L^2 = \|\hat{x}_{LS}\|^2$ as an estimate of $\|x\|^2$. However, the fact that
shrinkage techniques such as the SBME dominate LS indicates
that $\hat{x}_{LS}$ is in fact an overestimate of $x$. It is arguably more
accurate to use a smaller value than $\|\hat{x}_{LS}\|^2$ to estimate $\|x\|^2$.
In particular, it is readily shown that

$$E\left\{\|\hat{x}_{LS}\|^2\right\} = \|x\|^2 + \epsilon_0. \tag{43}$$
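
The identity (43) can be verified in one line. As a sketch, assuming the standard property of the LS estimator in this model (not restated in this section) that $\hat{x}_{LS} = x + e$ with $e \sim N(0, Q^{-1})$, so that $\epsilon_0 = E\{\|e\|^2\} = \mathrm{Tr}(Q^{-1})$:

```latex
E\{\|\hat{x}_{LS}\|^2\}
  = \|x\|^2 + 2\,\mathrm{Re}\{x^* E\{e\}\} + E\{\|e\|^2\}
  = \|x\|^2 + \mathrm{Tr}(Q^{-1})
  = \|x\|^2 + \epsilon_0 .
```

The cross term vanishes because the LS estimator is unbiased, i.e., $E\{e\} = 0$.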



Hence, one may opt to use

$$L^2 = \|\hat{x}_{LS}\|^2 - \epsilon_0 \tag{44}$$

as an estimate of $\|x\|^2$. It is important to note that such a
value of $L^2$ cannot be used with the linear minimax method,
since $L^2$ is negative with nonzero probability; a parameter set
with negative radius is undefined. However, substituting (44)
into a minimax technique, as per the blind minimax approach,
can still lead to well-defined estimators. In particular, substituting
(44) into the spherical minimax method yields the
"balanced" BME

$$\hat{x}_{BBM} = \left(1 - \frac{\epsilon_0}{\|\hat{x}_{LS}\|^2}\right)\hat{x}_{LS}. \tag{45}$$



A striking property of the balanced BME is that it reduces to
Stein's estimator [6] in the i.i.d. case. Both techniques are
well-defined unless $\hat{x}_{LS} = 0$, an event which has zero probability.
Furthermore, the balanced BME extends Stein's method, in
that it continues to dominate LS for the non-i.i.d. case, under
suitable conditions. This is shown by the following theorem.

Theorem 3: Suppose $\epsilon_0/\epsilon_{\max} > 4$, where $\epsilon_0$ is the MSE of
the LS estimator, $\epsilon_{\max}$ is the largest eigenvalue of $Q^{-1}$, and
$Q = H^* C_w^{-1} H$. Then, the balanced BME (45) strictly dominates the LS
estimator.

Proof: The theorem follows by substituting c = in 
Proposition Q] ■ 
A well-known drawback of Stein's approach is that it
sometimes causes negative shrinkage, i.e., the shrinkage factor
in (45) is negative with nonzero probability. This is known to
increase the MSE [24]. From the blind minimax perspective,
this negative shrinkage is a result of the fact that $L^2$ can
become negative. Thus, it is natural to replace (44) with

$$L^2 = \left(\|\hat{x}_{LS}\|^2 - \epsilon_0\right)_+ \tag{46}$$



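where $(a)_+ = \max(a, 0)$. Substituting this value of $L^2$ into the
spherical minimax estimator yields the "positive-part BME,"
given by

$$\hat{x}_{PBM} = \frac{\left(\|\hat{x}_{LS}\|^2 - \epsilon_0\right)_+}{\left(\|\hat{x}_{LS}\|^2 - \epsilon_0\right)_+ + \epsilon_0}\,\hat{x}_{LS}. \tag{47}$$

Note that when $\|\hat{x}_{LS}\|^2 - \epsilon_0 < 0$, the estimator $\hat{x}_{PBM}$ equals
0; in all other cases, $\hat{x}_{PBM} = \hat{x}_{BBM}$. Thus, $\hat{x}_{PBM}$ may be
written as

$$\hat{x}_{PBM} = \left(1 - \frac{\epsilon_0}{\|\hat{x}_{LS}\|^2}\right)_+ \hat{x}_{LS}. \tag{48}$$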



In other words, $\hat{x}_{PBM}$ is the positive part of the balanced
BME. Specifically, in the i.i.d. case, $\hat{x}_{PBM}$ is the positive-part
correction of Stein's estimator. In the i.i.d. case, Baranchik
[24] demonstrated that $\hat{x}_{PBM}$ dominates $\hat{x}_{BBM}$. An interesting
question for further research is whether the dominance
property holds in the non-i.i.d. case as well.

Fig. 3. Comparison between the positive part approach and the SBME.
The positive part method results in stronger shrinkage, which improves
performance for low SNR at the expense of high SNR.
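
The two estimators of this section differ only in whether the shrinkage factor is clipped at zero. The following minimal sketch illustrates (45) and (48) for a real-valued $\hat{x}_{LS}$; the function names are hypothetical, and `eps0` is assumed to be supplied by the caller (it equals the MSE of the LS estimator).

```python
import numpy as np

def balanced_bme(x_ls, eps0):
    """Balanced BME (45): (1 - eps0 / ||x_ls||^2) * x_ls.

    The shrinkage factor is negative whenever ||x_ls||^2 < eps0.
    """
    x_ls = np.asarray(x_ls, dtype=float)
    energy = np.dot(x_ls, x_ls)            # ||x_ls||^2, assumed nonzero
    return (1.0 - eps0 / energy) * x_ls

def positive_part_bme(x_ls, eps0):
    """Positive-part BME (48): the same factor, clipped at zero."""
    x_ls = np.asarray(x_ls, dtype=float)
    energy = np.dot(x_ls, x_ls)            # ||x_ls||^2, assumed nonzero
    return max(1.0 - eps0 / energy, 0.0) * x_ls

x_ls = np.array([3.0, 4.0])                # ||x_ls||^2 = 25
print(balanced_bme(x_ls, eps0=5.0))        # shrinkage factor 1 - 5/25 = 0.8
print(balanced_bme(x_ls, eps0=50.0))       # factor 1 - 50/25 = -1 (negative shrinkage)
print(positive_part_bme(x_ls, eps0=50.0))  # factor clipped to 0
```

The second call shows the negative shrinkage discussed above: the estimate is reflected through the origin, which the positive-part version avoids by returning zero.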

The "balanced" method presented in this section for estimating
the parameter set radius results in a value (44) of
$L^2$ which is smaller than that of the SBME. As a result, the
balanced approach causes more shrinkage towards the origin.
This tends to improve performance for low signal-to-noise
ratio (SNR) at the expense of performance degradation for
high SNR. In particular, $\hat{x}_{PBM}$ has a positive probability of
yielding an estimate of 0. This may indeed reduce the MSE
when the parameter is exceedingly small with respect to the
noise variance, but will sacrifice high-SNR performance.

In Fig. 3, the positive part estimator $\hat{x}_{PBM}$ is compared
with the SBME of Section III. The problem setting of this
simulation is identical to that of Fig. 5(a), which will be
described in detail in Section VII. In general, the positive-part
BME tends to perform as well as or worse than the SBME at SNR
values above 0 dB, and better for lower SNR values. Thus, in
most applications, use of the SBME is probably preferable.
However, the fact that Stein's estimator can be derived and 
extended using blind minimax considerations illustrates the 
versatility of this approach. 

VI. Comparison with LS Regularization 

Independently of the development of Stein-type estimators, 
many researchers became aware of deficiencies of the LS 
approach for solving ill-conditioned problems. A variety of 
alternatives were proposed as a result. These substitutes were 
generally not required to dominate the LS estimator; rather, 
they were intended to improve estimation quality in specific 
scenarios. Of these approaches, the most common is Tikhonov 
regularization [25], also referred to as ridge regression [26]. 

Tikhonov regularization is intended for ill-posed problems,
i.e., problems in which $H^* C_w^{-1} H$ is nearly singular. The
matrix $Q = H^* C_w^{-1} H$ is guaranteed to be positive-definite
(and hence invertible), since we assume that $H$ is full-rank and
$C_w$ is positive-definite. However, $Q$ may contain eigenvalues
which are very close to zero. In these cases, the LS estimator
(which depends on the term $Q^{-1}$) causes severe amplification
of measurement noise. In effect, an ill-posed setting is one in
which the SNR of at least one parameter is extremely low;
as we have seen, the LS approach results in overestimation in
such conditions. Regularization techniques attempt to mitigate
this problem by improving the conditioning of the matrix $Q$.

Tikhonov regularization may be justified in a Bayesian
setting, as follows. Suppose that the parameter vector $x$ is
known to be distributed normally, independently of the noise
$w$, with zero mean and a covariance matrix $C_x$. The minimum
MSE estimator of $x$ given $y$ is then the Wiener filter [1], [27]

$$\hat{x} = \left(H^* C_w^{-1} H + C_x^{-1}\right)^{-1} H^* C_w^{-1} y. \tag{49}$$

In practice, $x$ is a deterministic parameter, and thus does not
have a covariance matrix. However, by replacing $C_x^{-1}$ with an
appropriately chosen regularization matrix, the (generalized)
Tikhonov estimator is obtained.

There are several methods for empirically selecting a regularization
matrix $C_x^{-1}$. If nothing is known about the parameter
$x$, one possibility is to choose $C_x = \sigma^2 I$, where $\sigma^2$ is to be
estimated from $y$. Optimally, one would like to use the average
value of $x_i^2$ as an approximation of the variance $\sigma^2$. However,
since $x$ is unknown, this is not possible. Instead, $\sigma^2$ can be
estimated as $\sum_i \hat{x}_{LS,i}^2/m$, which is an approximation of the
desired quantity $\sum_i x_i^2/m$. This results in the estimator

$$\hat{x}^{(1)} = \left(H^* C_w^{-1} H + \frac{m}{\|\hat{x}_{LS}\|^2}\, I\right)^{-1} H^* C_w^{-1} y. \tag{50}$$



This derivation is based on an empirical Bayes approach
[28], in which the elements of $x$ are assumed to be i.i.d.
An alternative is to assume instead that the variance of $x$
is proportional to the variance of the noise $w$, which implies
$C_x = \alpha Q^{-1}$. In analogy to the previous derivation, one may
then estimate $\alpha$ as $\|\hat{x}_{LS}\|_Q^2/m$. Substituting into (49) results
in the shrinkage estimator

$$\hat{x}^{(2)} = \frac{\|\hat{x}_{LS}\|_Q^2}{\|\hat{x}_{LS}\|_Q^2 + m}\,\hat{x}_{LS}. \tag{51}$$


Unfortunately, the Tikhonov estimators $\hat{x}^{(1)}$ and $\hat{x}^{(2)}$ do
not dominate LS; like the original Tikhonov regularization,
they perform poorly at high SNR values. To illustrate this,
we performed a simulation in which the MSE of the LS
method was compared to that of $\hat{x}^{(1)}$ and $\hat{x}^{(2)}$. In this
example, 15 parameters were estimated using 15 independent
measurements, with $H = I$. The noise variance of five of the
measurements was 100 times larger than the noise variance
of the remaining measurements. The parameter vector was
chosen in the direction of a high-variance measurement, and
its magnitude was varied to obtain different SNR values. Here
and in the remainder of the paper, we define the SNR as

$$\mathrm{SNR} = \frac{\|x\|^2}{E\{\|w\|^2\}} = \frac{\|x\|^2}{\mathrm{Tr}(C_w)}. \tag{52}$$

For comparison, the MSEs of the LS and blind minimax
techniques were also calculated.

Fig. 4. Tikhonov regularization does not dominate the LS estimator. The
Tikhonov estimators are seen to perform worse than the LS estimator at
high SNR, whereas the BMEs dominate the LS method.
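
The simulation setting described above can be set up in a few lines. The sketch below is an illustration under stated assumptions, not the authors' code: it reads (52) as $\mathrm{SNR} = \|x\|^2/\mathrm{Tr}(C_w)$, places the five high-variance measurements first (the text does not say which five), and normalizes the low noise variance to 1. The helper name `scale_to_snr` is hypothetical.

```python
import numpy as np

def scale_to_snr(direction, C_w, snr_db):
    """Scale `direction` so that ||x||^2 / Tr(C_w) equals the requested SNR."""
    direction = np.asarray(direction, dtype=float)
    target_energy = 10.0 ** (snr_db / 10.0) * np.trace(C_w)   # desired ||x||^2
    return direction * np.sqrt(target_energy) / np.linalg.norm(direction)

# 15 parameters, H = I; five measurements have 100 times the noise variance.
noise_var = np.ones(15)
noise_var[:5] = 100.0            # assumption: the noisy measurements come first
C_w = np.diag(noise_var)

# Parameter vector in the direction of a high-variance measurement.
direction = np.zeros(15)
direction[0] = 1.0
x = scale_to_snr(direction, C_w, snr_db=10.0)
snr_db_check = 10.0 * np.log10(np.dot(x, x) / np.trace(C_w))
```

Varying `snr_db` while keeping `direction` fixed reproduces the way the magnitude of the parameter vector is swept in this experiment.
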



The results are displayed in Fig. 4. It is evident from
this figure that the Tikhonov regularization is inadequate at 
high SNR, as it performs worse than the LS estimator. Both 
Tikhonov approaches converge to the LS approach at infinite 
SNR, but consistently obtain higher MSE than the LS method 
for SNR values above 5 dB. This makes them unattractive 
candidates for replacing the LS technique. 

VII. Numerical Results 

Estimator performance depends on a variety of operating
conditions, including the effective dimension, the SNR,
the eigenvalues of $Q = H^* C_w^{-1} H$, and the value of the
unknown parameter vector $x$. Several computer simulations
were implemented to test the effect of these conditions on
performance of the SBME and EBME. In these tests, a value
of $b = -1$ was used for the parameter set of the
EBME. The simulations were also used to compare the BMEs
with Bock's estimator [13], which is the most commonly
used extended Stein estimator [16], [17]. Like Stein's results,
Bock's approach consists of a shrinkage estimator, given by

$$\hat{x}_{Bock} = \left(1 - \frac{\epsilon_0/\epsilon_{\max} - 2}{\|\hat{x}_{LS}\|_Q^2}\right)\hat{x}_{LS}. \tag{53}$$
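
As an illustration of (53), the sketch below computes the effective dimension $\epsilon_0/\epsilon_{\max}$ and applies Bock's shrinkage to a real-valued $\hat{x}_{LS}$. The function names are hypothetical, $Q$ is assumed symmetric positive-definite, and $\epsilon_0$ is taken to be $\mathrm{Tr}(Q^{-1})$, the MSE of the LS estimator.

```python
import numpy as np

def effective_dimension(Q):
    """eps0 / eps_max, where eps0 = Tr(Q^{-1}) and eps_max = lambda_max(Q^{-1})."""
    inv_eigs = 1.0 / np.linalg.eigvalsh(Q)     # eigenvalues of Q^{-1}
    return inv_eigs.sum() / inv_eigs.max()

def bock_estimate(x_ls, Q):
    """Bock's estimator (53) applied to a real-valued LS estimate."""
    x_ls = np.asarray(x_ls, dtype=float)
    q_energy = x_ls @ Q @ x_ls                 # ||x_ls||_Q^2, assumed nonzero
    return (1.0 - (effective_dimension(Q) - 2.0) / q_energy) * x_ls

# White-noise example: Q = I with m = 5, so the effective dimension is 5.
x_ls = np.array([3.0, 4.0, 0.0, 0.0, 0.0])    # ||x_ls||_Q^2 = 25
print(bock_estimate(x_ls, np.eye(5)))          # shrinkage factor 1 - 3/25 = 0.88
```

For $Q = I$ the effective dimension equals $m$, and the factor becomes $1 - (m-2)/\|\hat{x}_{LS}\|^2$, the familiar James-Stein form.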



The theorems of Sections III and IV ensure that the BMEs
achieve lower MSE than the LS estimator, but do not guarantee
that this improvement is substantial. To measure this
performance gain, we first chose a typical scenario, in which
the number of parameters $m$ and the number of measurements
$n$ were both 15. The system matrix $H$ was chosen as $I$, and
the noise covariance was

$$C_w = \sigma^2\,\mathrm{diag}(1, 1, 1, 1, 0.5, 0.2, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.05, 0.05) \tag{54}$$

resulting in an effective dimension of 5.8. Here $\sigma^2$ was
selected to achieve the desired SNR (52). To illustrate the
dependence on the value of the parameter vector $x$, two
different settings were tested. In Fig. 5(a), $x$ is chosen in
the direction of the maximum eigenvector of $Q^{-1}$, while
in Fig. 5(b), $x$ is chosen in the direction of the minimum
eigenvector. This corresponds to parameters in the direction
of maximal and minimal noise, respectively. Estimates of the
MSE were calculated for a range of SNR values by generating
10,000 random realizations of noise per SNR value.
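
A scaled-down version of this experiment can be sketched as follows. This is an illustration under several assumptions, not the authors' simulation code: the SBME is taken to be $\|\hat{x}_{LS}\|^2/(\|\hat{x}_{LS}\|^2 + \epsilon_0)\,\hat{x}_{LS}$, which is the form implied by the spherical minimax substitution of Section V; $\sigma^2$ is fixed to 1 and the magnitude of $x$ is scaled instead; and a single SNR of 0 dB is used with $x$ in a maximal-noise direction, as in the setting of Fig. 5(a).

```python
import numpy as np

def sbme(x_ls, eps0):
    """Assumed SBME form: L^2 / (L^2 + eps0) * x_ls with L^2 = ||x_ls||^2."""
    energy = np.sum(x_ls ** 2, axis=-1, keepdims=True)
    return energy / (energy + eps0) * x_ls

def simulate_mse(x, noise_var, n_trials=10000, seed=0):
    """Monte Carlo MSE of the LS estimator and the SBME for H = I."""
    rng = np.random.default_rng(seed)
    # With H = I the LS estimate is the measurement itself: x_ls = x + w.
    x_ls = x + rng.standard_normal((n_trials, x.size)) * np.sqrt(noise_var)
    eps0 = noise_var.sum()                     # Tr(Q^{-1}) = Tr(C_w) when H = I
    mse_ls = np.mean(np.sum((x_ls - x) ** 2, axis=1))
    mse_sbme = np.mean(np.sum((sbme(x_ls, eps0) - x) ** 2, axis=1))
    return mse_ls, mse_sbme

# Noise variances from (54) with sigma^2 = 1.
noise_var = np.array([1, 1, 1, 1, .5, .2, .2, .2, .2, .1, .1, .1, .1, .05, .05])
effective_dim = noise_var.sum() / noise_var.max()

# 0 dB SNR: ||x||^2 = Tr(C_w); x points along a maximal-noise coordinate.
x = np.zeros(15)
x[0] = np.sqrt(noise_var.sum())
mse_ls, mse_sbme = simulate_mse(x, noise_var)
print(f"effective dimension = {effective_dim:.1f}")
print(f"MSE(LS) = {mse_ls:.2f}, MSE(SBME) = {mse_sbme:.2f}")
```

Sweeping the magnitude of `x` over a range of SNR values, and repeating with `x` along a minimal-noise coordinate, gives curves analogous to the two panels of Fig. 5.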

It is evident from Fig. 5 that substantial improvement in
MSE can be achieved by using BMEs in place of the LS 
approach: in some cases the MSE of the LS estimator is nearly 
three times larger than that of the BMEs. The performance 
gain is particularly noticeable at low and moderate SNR. At 
infinite SNR, the LS technique is known to be optimal [1], and 
all other methods converge to the value of the LS estimate; as 
a result, performance gain is smaller at high SNR, although 
substantial improvement can be obtained even at 10-15 dB. 

To further compare the BMEs with Bock's estimator, an- 
other simulation was performed, in which a large set of 
parameter values x were generated for different SNRs. For 
each estimator, and for each SNR, the lowest and highest MSE 
were determined, resulting in a measure of the performance 
range for each estimator. This performance range is displayed 
in Fig. 6 for two different choices of $C_w$, which are indicated
in the figure caption. One may observe that both BMEs
outperform Bock's estimator under nearly all circumstances.
It is also interesting to note that while the MSE of the EBME
is highly dependent on the value of the parameter vector $x$, the
performance of the SBME is fairly constant. This is a result 
of the symmetric form of the SBME. On the other hand, the 
EBME achieves considerably lower MSE for most values of 
the parameter vector. 

It is insightful to compare the performance of the SBME
and EBME in Figs. 5 and 6. While the worst-case performance
of the two blind minimax techniques is similar, the EBME
performs considerably better for some values of $x$. This is a
result of the fact that the EBME selectively shrinks the noisy
measurements, whereas the SBME uses an identical shrinkage 
factor for all elements. If one measurement contains very little 
noise, the SBME is forced to reduce the shrinkage of all 
other measurements. The EBME, by contrast, can effectively 
reduce the effect of noisy measurements without shrinking 
the clean elements. As a result, the EBME is superior by far 
if x is orthogonal to the noisiest measurements, whence the 
selective shrinkage is most effective; its performance gain is 
less substantial when x is in the direction of high shrinkage, 
since in these cases, shrinkage is applied to the parameter as 
well as the noise. 

Another important advantage of the blind minimax approach
over Bock's estimator is that the latter converges
to the LS technique when the matrix $Q$ is ill-conditioned,
i.e., when some eigenvalues are much larger than others.
This is because the shrinkage in Bock's method (53) is a
function of $1/\|\hat{x}_{LS}\|_Q^2$. As a result, when $\hat{x}_{LS}$ contains a
significant component in the direction of a large eigenvalue of
$Q$, shrinkage becomes negligible. Yet, in this case, shrinkage
is still desirable for the remaining eigenvalues. This effect is
demonstrated in Fig. 7, which plots the performance of the
various approaches for matrices $Q$ having condition numbers
between 1 and 1000. Here, 10 parameters and 10 measurements
are used, $H = I$, and the noise covariance is chosen
such that the first five eigenvalues equal 1 and the remaining
five eigenvalues equal a value $v$, which is chosen to obtain
the desired condition number. For each condition number,
a large set of values $x$ are chosen such that the SNR is
0 dB; as in Fig. 6, the range of MSE values obtained for
each estimate is plotted. It is evident that Bock's estimator
approaches the LS method for ill-conditioned matrices, despite
the fact that shrinkage can still improve performance, as
indicated by the performance of the SBME. The performance
of the EBME improves relative to the LS estimator for
ill-conditioned matrices, since the high-noise components are
further reduced in this case.

Fig. 6. Range of possible MSE values obtained for different values of $x$, as
a function of SNR. $H = I$ for both figures. (a) $m = n = 15$, with eigenvalues
of $C_w$ distributed uniformly between 1 and 0.01, resulting in an effective
dimension of 7.6; (b) $m = n = 10$, with $C_w$ containing five eigenvalues of 1 and
five eigenvalues of 0.1, resulting in an effective dimension of 5.5.

Fig. 7. Range of possible MSE values obtained for different values of $x$, as
a function of the condition number of $Q$. SNR = 0 dB, $m = n = 10$.
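
The vanishing of Bock's shrinkage for ill-conditioned $Q$ can be seen without any simulation by evaluating the shrinkage terms for a fixed estimate. The sketch below is a simplified illustration, not a reproduction of Fig. 7: it holds $\hat{x}_{LS}$ fixed rather than fixing the SNR, assumes $v = 1/\kappa$ for condition number $\kappa$, and uses the SBME form $\|\hat{x}_{LS}\|^2/(\|\hat{x}_{LS}\|^2 + \epsilon_0)$ implied by Section V. The function name is hypothetical.

```python
import numpy as np

def shrinkage_amounts(kappa, x_ls):
    """Return (bock, sbme): the fraction of x_ls removed by each estimator.

    Noise eigenvalues are 1 (five times) and 1/kappa (five times), with H = I,
    so Q = C_w^{-1} has condition number kappa.
    """
    noise_var = np.concatenate([np.ones(5), np.full(5, 1.0 / kappa)])
    q_eigs = 1.0 / noise_var                   # eigenvalues of Q
    eps0 = noise_var.sum()                     # Tr(Q^{-1})
    d_eff = eps0 / noise_var.max()             # effective dimension
    bock = (d_eff - 2.0) / np.sum(q_eigs * x_ls ** 2)   # term subtracted in (53)
    energy = np.sum(x_ls ** 2)
    sbme = eps0 / (energy + eps0)              # 1 - energy / (energy + eps0)
    return bock, sbme

x_ls = np.ones(10)                             # fixed illustrative estimate
for kappa in (1.0, 10.0, 100.0, 1000.0):
    bock, sbme = shrinkage_amounts(kappa, x_ls)
    print(f"cond = {kappa:6.0f}   Bock: {bock:.4f}   SBME: {sbme:.4f}")
```

In this example Bock's shrinkage term decays roughly as $1/\kappa$ because $\|\hat{x}_{LS}\|_Q^2$ grows with the large eigenvalues of $Q$, whereas the SBME term depends only on $\|\hat{x}_{LS}\|^2$ and $\epsilon_0$ and stays bounded away from zero.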

VIII. Discussion 

The blind minimax approach is a general technique for using 
minimax estimators in situations for which no parameter set 
is known. We considered an application of this concept to 
the Gaussian linear regression model. Two novel estimators 
were proposed: a technique based on a spherical parameter 
set, and one based on an ellipsoidal parameter set. In Sections
III and IV, these approaches were shown to dominate the
LS method. Under fairly weak conditions, in any application
which makes use of the LS estimator, the MSE performance
can be improved by using a BME instead. Furthermore, in
Section V, we demonstrated that Stein's approach, as well as
its positive part modification, can be derived and generalized
using the blind minimax framework.

It can readily be shown that the dominance condition
of the SBME (Theorem 1) is weaker than the dominance
condition of the EBME (Theorem 2), i.e., the conditions for
SBME dominance hold whenever the conditions for EBME
dominance hold. The dominance condition of Bock's estimator
[13] is still weaker.¹ This would seem to indicate that Bock's
estimator is superior to the proposed estimators. Yet the results
of Section VII demonstrate that the opposite is true: the BMEs
usually outperform Bock's estimator. This is true in particular 
for ill-conditioned problems, for which the LS estimator is 
notoriously inaccurate; for such problems, Bock's approach 
dominates the LS method by a negligible margin, whereas 
the BMEs achieve a significant performance gain. Thus, while 
dominance theorems are useful in providing sufficient condi- 
tions for improving on the LS estimator, they are ill-suited for 
comparing LS-dominating estimators. This conclusion is note- 
worthy since estimators are sometimes chosen by maximizing 
the range of conditions for which dominance is guaranteed. It 
seems that other analytical tools are required for comparing 
LS-dominating estimators. For example, it may be possible to 
prove that BMEs dominate Bock's estimator, for some problem 
settings. 

The choice between the different BMEs is application-dependent.
As demonstrated in Section VII, the SBME reliably
achieves constant performance for a variety of values of $x$,
although the typical performance of the EBME is superior. The
EBME is particularly well-adapted to ill-posed problems, in
which some measurements are much more noisy than others.
In such cases, the use of a single shrinkage factor for all
measurements is clearly suboptimal. As a result, scalar shrinkage
methods such as the SBME and Bock's technique often result
in little improvement over the LS estimator, while the EBME
is capable of selectively shrinking the noisy measurements,
thus improving performance.

¹A simple change to the SBME (adding -2 to the numerator) changes
its dominance condition to that of Bock's estimator, without significantly
affecting its performance. However, we have been unable to derive this
modification using the blind minimax approach, and thus prefer the simpler
form of the SBME used in the paper.

The use of a componentwise shrinkage technique such as 
the EBME may be useful in additional contexts as well. 
In some applications, MSE minimization is only a nominal 
goal which approximates some other error criterion. In these 
cases, a scalar shrinkage estimator has no impact on the actual
objective. For example, if the vector x is an image which 
is to be reconstructed, its subjective quality is not affected 
by multiplying the entire estimate by a scalar. Likewise, in 
a binary receiver, the sign of x must be determined, but the 
sign does not change when the estimate is shrunk. In such 
applications, the SBME (and Bock's estimator) have no effect 
on the final result, whereas the EBME can be used to improve 
performance. 

IX. Conclusion 

In this paper, we presented the blind minimax strategy, 
whereby one uses linear minimax estimators whose parameter 
set is itself estimated from measurements. This simple concept 
was examined in the setting of a linear system of measure- 
ments with colored Gaussian noise, where we have shown 
that the BMEs dominate the LS method. Hence, in any such 
problem, the proposed estimators can be used in place of the 
LS approach, with a guaranteed performance gain. Apart from 
being useful in and of themselves, the proposed techniques 
support the underlying concept of blind minimax estimation. 
This concept can be applied to many other problems, such 
as estimation with uncertain system matrices, estimation with 
non-Gaussian noise, and sequential estimation. Use of the 
blind minimax approach in such problems remains a topic for 
further study. 

Stein's discovery of LS-dominating estimators, half a cen- 
tury ago, shocked the statistics community, and his results 
are still rarely used in practice. It is our hope that the 
blind minimax concept will provide additional support for 
such estimators, both by supplying an intuitive understanding 
of Stein's phenomenon, and by providing a wide class of 
powerful new estimators.